An Efficient Approach to Mining Frequent Itemsets on Data Streams
نویسنده
چکیده
The increasing importance of data stream arising in a wide range of advanced applications has led to the extensive study of mining frequent patterns. Mining data streams poses many new challenges amongst which are the one-scan nature, the unbounded memory requirement and the high arrival rate of data streams. In this paper, we propose a new approach for mining itemsets on data stream. Our approach SFIDS has been developed based on FIDS algorithm. The main attempts were to keep some advantages of the previous approach and resolve some of its drawbacks, and consequently to improve run time and memory consumption. Our approach has the following advantages: using a data structure similar to lattice for keeping frequent itemsets, separating regions from each other with deleting common nodes that results in a decrease in search space, memory consumption and run time; and Finally, considering CPU constraint, with increasing arrival rate of data that result in overloading system, SFIDS automatically detect this situation and discard some of unprocessing data. We guarantee that error of results is bounded to user pre-specified threshold, based on a probability technique. Final results show that SFIDS algorithm could attain about 50% run time improvement than FIDS approach. Keywords—Data stream, frequent itemset, stream mining.
منابع مشابه
An Efficient Algorithm to Mine Online Data Streams
Mining frequent closed itemsets provides complete and condensed information for non-redundant association rules generation. Extensive studies have been done on mining frequent closed itemsets, but they are mainly intended for traditional transaction databases and thus do not take data stream characteristics into consideration. In this paper, we propose a novel approach for mining closed frequen...
متن کاملCLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data
Recently, frequent itemsets mining over data streams attracted much attention. However, mining closed itemsets from data stream has not been well addressed. The main difficulty lies in its high complexity of maintenance aroused by the exact model definition of closed itemsets and the dynamic changing of data streams. In data stream scenario, it is sufficient to mining only approximated frequent...
متن کاملAn Efficient Algorithm for Maintaining Frequent Closed Itemsets over Data Stream
Data mining refers to the process of revealing unknown and potentially useful information from a large database. Frequent itemsets mining is one of the foundational problems in data mining, which is to discover the set of products that purchased frequently together by customers from a transaction database. However, there may be a large number of patterns generated from database, and many of the...
متن کاملIncremental updates of closed frequent itemsets over continuous data streams
Online mining of closed frequent itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose an efficient one-pass algorithm, NewMoment to maintain the set of closed frequent itemsets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the...
متن کاملTop-k-FCI: Mining Top-K Frequent Closed Itemsets in Data Streams
With the generation and analysis of stream data, such as network monitoring in real time, log records, click streams, a great deal of attention has been concerned on data streams mining in the field of data mining. In the process of the data streams mining, it is more reasonable to ask users to set a bound on the result size. Therefore, in this paper, an real-time single-pass algorithm, called ...
متن کاملMining Recent Frequent Itemsets in Sliding Windows over Data Streams
This paper considers the problem of mining recent frequent itemsets over data streams. As the data grows without limit at a rapid rate, it is hard to track the new changes of frequent itemsets over data streams. We propose an efficient one-pass algorithm in sliding windows over data streams with an error bound guarantee. This algorithm does not need to refer to obsolete transactions when 316 C....
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009